Trading accuracy for faster entity linking

Authors

  • Kristy Hughes
  • Joel Nothman
  • James R. Curran
Abstract

Named entity linking (NEL) can be applied to documents such as financial reports, web pages and news articles, but state-of-the-art disambiguation techniques are currently too slow for web-scale applications because their complexity is high with respect to the number of candidates. In this paper, we accelerate NEL by taking two successful disambiguation features (popularity and context similarity) and using them to reduce the number of candidates before further disambiguation takes place. Popularity is measured by in-link score, and context similarity is measured by locality sensitive hashing. We present a novel approach to locality sensitive hashing which embeds the projection matrix into a smaller array and extracts columns of the projection matrix using feature hashing, resulting in a low-memory approximation. On a test set, the linker runs in 63% of the baseline time with an accuracy loss of 0.72%.
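The candidate-reduction step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Popularity is approximated by a raw in-link count, and context similarity by signed random projection (SimHash-style) signatures compared by Hamming distance. The "smaller array" is assumed to be a pool of Gaussian values from which each feature's projection column is read as a contiguous window starting at the feature's CRC32 hash. All names (`HashedProjectionLSH`, `prune_candidates`, `pool_size`, `top_popular`, `top_similar`) are hypothetical.

```python
import random
import zlib
from typing import Dict, List, Tuple

Bag = Dict[str, float]              # sparse bag-of-words context vector
Candidate = Tuple[str, int, Bag]    # (entity id, in-link count, context bag)


def _hash(key: str) -> int:
    """Deterministic 32-bit feature hash (CRC32; Python's hash() is salted)."""
    return zlib.crc32(key.encode("utf-8"))


class HashedProjectionLSH:
    """Signed-random-projection LSH without a stored |vocab| x n_bits matrix.

    Assumed layout (hypothetical): each feature's projection column is the
    n_bits-long window of a small Gaussian pool starting at hash(feature),
    wrapping around the end of the pool.
    """

    def __init__(self, n_bits: int = 64, pool_size: int = 4096, seed: int = 13):
        rng = random.Random(seed)
        self.n_bits = n_bits
        self.pool = [rng.gauss(0.0, 1.0) for _ in range(pool_size)]

    def column(self, feature: str) -> List[float]:
        # Feature hashing decides where this feature's column starts in the pool.
        start = _hash(feature) % len(self.pool)
        return [self.pool[(start + i) % len(self.pool)] for i in range(self.n_bits)]

    def signature(self, bag: Bag) -> int:
        # Project the sparse vector, then keep one sign bit per projection.
        acc = [0.0] * self.n_bits
        for feature, weight in bag.items():
            for i, w in enumerate(self.column(feature)):
                acc[i] += weight * w
        return sum(1 << i for i, x in enumerate(acc) if x >= 0)


def hamming(a: int, b: int) -> int:
    # Fewer differing bits suggests a smaller angle, i.e. higher cosine similarity.
    return bin(a ^ b).count("1")


def prune_candidates(mention: Bag, candidates: List[Candidate],
                     lsh: HashedProjectionLSH,
                     top_popular: int = 10, top_similar: int = 3) -> List[str]:
    """Keep the most popular candidates, then the most context-similar ones.

    top_popular and top_similar are illustrative cut-offs, not values from
    the paper. In practice entity signatures would be precomputed offline.
    """
    popular = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_popular]
    m_sig = lsh.signature(mention)
    ranked = sorted(popular, key=lambda c: hamming(m_sig, lsh.signature(c[2])))
    return [entity_id for entity_id, _, _ in ranked[:top_similar]]
```

For example, with a mention context of `{"bank": 1.0, "river": 1.0}` and `top_popular=2`, a candidate with very few in-links is discarded before any signature is compared, and a remaining candidate whose context bag matches the mention's exactly has Hamming distance 0 and is ranked first. In this sketch, storage for the projection drops from one value per (feature, bit) pair to `pool_size` values, at the cost of correlated columns for features whose windows overlap in the pool.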

Similar resources

The Effect of Transitive Closure on the Calibration of Logistic Regression for Entity Resolution

This paper describes a series of experiments in using logistic regression machine learning as a method for entity resolution. From these experiments the authors concluded that when a supervised ML algorithm is trained to classify a pair of entity references as linked or not linked pair, the evaluation of the model’s performance should take into account the transitive closure of its pairwise lin...

Estimating the Parameters for Linking Unstandardized References with the Matrix Comparator

This paper discusses recent research on methods for estimating configuration parameters for the Matrix Comparator used for linking unstandardized or heterogeneously standardized references. The matrix comparator computes the aggregate similarity between the tokens (words) in a pair of references. The two most critical parameters for the matrix comparator for obtaining the best linking results a...

Computerized Linking of Capital Markets - A Viable Approach

Interlinking capital markets has always been an interesting issue, since it not only provides more investment opportunities but also reduces the risk of market volatility due to the increase in market size. However, global and local barriers such as different currencies, legal issues, settlement risks and costs prevent such interlinkage from taking place efficiently. In this paper, ...

UBC Entity Linking at TAC-KBP 2013: random forests for high accuracy

This paper describes our systems and the different runs submitted for the Entity Linking task at TAC-KBP 2013. We developed two systems: a generative entity linking model, and a supervised system that reuses the scores of the first model using random forests. Our main research interest is the Named Entity Disambiguation task, and we thus performed a very naive clustering of NIL instance...

Faster (and Better) Entity Linking with Cascades

Entity linking requires ranking thousands of candidates for each query, a time consuming process and a challenge for large scale linking. Many systems rely on prediction cascades to efficiently rank candidates. However, the design of these cascades often requires manual decisions about pruning and feature use, limiting the effectiveness of cascades. We present Slinky, a modular, flexible, fast ...

Publication date: 2014